Imagination Developers Connection

PowerVR Graphics - 
Latest Developments and 
Future Plans 


Latest Developments and Future Plans 


A brief introduction 

• Joe Davis 

• Lead Developer Support Engineer, PowerVR Graphics 

• With Imagination’s PowerVR Developer Technology team for ~6 years 



• PowerVR Developer Technology 

• SDK, tools, documentation and developer support/relations (e.g. this session)
Company overview


About Imagination 

Multimedia, processors, communications and cloud IP 

Driving IP innovation with unrivalled portfolio 



Recognised leader in graphics, GPU compute and video IP 
#3 design IP company world-wide* 
* source: Gartner
About Imagination

Our IP plus our partners’ know-how combine to drive and disrupt 
About Imagination

Business model 




facebook.com/imgtec 


@PowerVRInsider | #idc15 


Imagination




About Imagination 

Our licensees and partners drive our business 
PowerVR Rogue Hardware


PowerVR Rogue

Recap 



■ Tile-based deferred renderer

■ Building on technology proven over 5 previous generations 

- Formally announced at CES 2012 

- USC - Universal Shading Cluster 

■ New scalar SIMD shader core 

■ General purpose compute is a first class citizen in the core ... 

■ ... while not forgetting what makes a shader core great for graphics 
TBDR

Tile-based 


■ Tile-based 

■ Split each render up into small tiles (32x32 
for the most part) 

■ Bin geometry after vertex shading into 
those tiles 

■ Tile-based rasterisation and pixel shading 

■ Keep all data access for pixel shading on 
chip 
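The binning step above can be sketched in a few lines. This is a minimal software model, assuming conservative bounding-box binning and a fixed 32x32 tile; the function name `bin_triangles` and its data layout are illustrative assumptions and do not describe how the hardware tiler is implemented.

```python
import math
from collections import defaultdict

TILE_SIZE = 32  # tile edge in pixels, per the slide's "32x32 for the most part"


def bin_triangles(triangles, width, height, tile_size=TILE_SIZE):
    """Map each (tx, ty) tile to the indices of triangles that may touch it.

    `triangles` is a list of three (x, y) screen-space vertices, i.e. positions
    after vertex shading. Bounding-box binning is conservative: a triangle can
    be listed in a tile it does not actually cover.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Skip triangles whose bounding box is entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the tile grid.
        tx0 = max(math.floor(min(xs)) // tile_size, 0)
        tx1 = min(math.floor(max(xs)) // tile_size, tiles_x - 1)
        ty0 = max(math.floor(min(ys)) // tile_size, 0)
        ty1 = min(math.floor(max(ys)) // tile_size, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)
```

Each tile's list is then all the geometry that tile's rasterisation and pixel shading needs, which is what lets the per-pixel work stay on chip.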
TBDR

Deferred 


■ Deferred rasterisation 

■ Don’t actually get the GPU to do any pixel 
shading straight away 

■ HW support for fully deferred rasterisation 
and then pixel shading 

■ Rasterisation is pixel accurate 
Conventional GPUs

All surfaces filled 



PowerVR GPUs 

Only visible surfaces filled 
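To make the comparison concrete, here is a rough counting model of pixel-shader invocations for opaque geometry. It assumes the conventional GPU applies an early depth test in submission order and that the deferred renderer shades exactly one fragment per covered pixel; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def shaded_fragments(pixel_fragments):
    """Return (immediate, deferred) pixel-shader invocation counts.

    `pixel_fragments` maps a pixel to the depths of the opaque fragments that
    cover it, in submission order; a smaller depth is closer to the camera.
    """
    immediate = 0
    deferred = 0
    for depths in pixel_fragments.values():
        if not depths:
            continue
        nearest = float("inf")
        for depth in depths:
            # Immediate mode: shade whatever passes the depth test at the
            # moment it is submitted, even if it is covered up later.
            if depth < nearest:
                immediate += 1
                nearest = depth
        # Deferred: visibility is resolved first, then one fragment is shaded.
        deferred += 1
    return immediate, deferred
```

With back-to-front submission the immediate count equals the full overdraw, while the deferred count stays at one per covered pixel regardless of submission order.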
TBDR

Bandwidth savings 


■ Bandwidth savings across all phases of 
rendering 

■ Only fetch the geometry needed for the tile 

■ Only process the visible pixels in the tile 

- Efficient processing 

■ Maximize available computational resources 

■ Do the best the hardware can with bandwidth 
TBDR

Power savings



■ Maximizing core efficiency 

■ Lighting up the USC less often is always going to be a 
saving 


■ Minimizing bandwidth 

■ Texturing less is a fantastic way to save power 

■ Geometry fetch and binning is often more than 10% of 
per-frame bandwidth 

■ Saves bandwidth for other parts of your render 
Rogue USC

Architectural Building Block 


■ Unified Shading Cluster 

■ Basic building block of the Rogue architecture 

■ Laid out in pairs, with a shared TPU 

- 1, 0.5 and 0.25 USC designs are special 

■ Different balance in the design 

■ Tend to find their way into non-gaming applications 
Unified Shading Cluster Array

USC0

Texture Unit 

USC1 

USCn-1 

Texture Unit 

USCn 
Rogue USC

Shader Architecture 



■ 16-wide in hardware 

- 32-wide branch granularity 

■ We run half a task/warp per clock 

■ Scalar SIMD 

■ Optimized ALU pipeline 

■ Mix of F32, F16, integer, floating point specials, logic
ops
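A back-of-the-envelope model of what "16-wide in hardware, 32-wide branch granularity" means for shader cost. It assumes every instruction takes `ceil(32 / 16) = 2` clocks per task and that a divergent branch executes both sides for the whole task; it ignores latency hiding, co-issue and scheduling, so treat it as a sketch with hypothetical function names, not a cycle-accurate description of the USC.

```python
import math

HW_LANES = 16    # instances executed per clock ("16-wide in hardware")
TASK_WIDTH = 32  # instances that branch together ("32-wide branch granularity")


def clocks_per_instruction(task_width=TASK_WIDTH, lanes=HW_LANES):
    """Clocks needed to run one instruction across a whole task."""
    return math.ceil(task_width / lanes)


def branch_cost(takes_if, if_instrs, else_instrs):
    """Clocks spent on an if/else across one task in this simplified model.

    `takes_if` holds one boolean per instance. When every instance agrees,
    only one side is executed; otherwise the task pays for both sides.
    """
    if len(takes_if) != TASK_WIDTH:
        raise ValueError("expected one flag per instance in the task")
    instructions = 0
    if any(takes_if):
        instructions += if_instrs
    if not all(takes_if):
        instructions += else_instrs
    return instructions * clocks_per_instruction()
```

In this model a single divergent instance out of 32 is enough to make the whole task pay for both sides of the branch, which is why branch coherence across neighbouring pixels matters.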


Rogue USC

Pipeline datapaths 



■ Configurable in the IP core 

■ F16 paths were sometimes optional, thankfully not
any more

■ F16 path performance increased significantly after
the first generation

- Performance in your shader 

■ F32 paths are dual FMAD 

■ F16 paths can do different things per cycle depending
on shader

■ ISA is available for you to interrogate though, with 
disassembling compilers 




Rogue USC 

Scalar 



■ Scalar ALUs 

■ Hard to overstate what a benefit this is

■ Seems obvious to do, right? 

■ Vector architectures are just hard to program well 

■ Scalar isn’t a free lunch 

■ Makes performance a lot more predictable for you 
Rogue USC

Programmable output registers 



■ The pixel output registers in the ISA are read/write 

■ One per pixel 

- Width depends on IP core 

■ We expose it programmatically with Pixel Local Storage 

■ Worked closely with ARM (thanks, Jan-Harald!) 
Evolution


Health Warning: Really Bad Diagrams™ 



Rogue Evolution 



- Architecture has changed quite a bit over time 

- Rogue in 2010 still mostly looks like a Rogue today 

- Significant evolutionary changes across the architecture 

■ Lots of it driven by developers before the IP is baked 

■ Lots of it also driven by us analysing your stuff anyway
Extra low power GFLOPS


Supports both LDR and HDR ASTC 
formats 
Series6 to Series6XT

Lots of lessons learned 



■ Improved scheduler 

■ Streamlined ISA 

■ Improved compute task efficiency 
- Completely new F16 datapath

■ Improved front-end for sustained geometry performance 

■ ASTC 




PowerVR Series7XT 
Series6XT to Series7XT

Adding features and smoothing off rough edges 



■ Changed how the architecture scales 

■ Improved USC 

■ Streamlined ISA 

■ Features 

■ Hardware tessellation 

■ DX11-compliant USC (precision mainly)

■ FP64 
Into the future



- Exciting changes being worked on across the architecture 

■ USC

■ Front-end 

■ Scaling 

■ Stuff you want! 


■ You can help 

■ We love feedback about the architecture and how it could best fit what you’re doing 

■ Don’t be shy 
PowerVR Wizard
Ray Tracing Update 


What is Ray Tracing? 



Ray tracing is the ability for the shader 
program for one object to be aware of the 
geometry of other objects. 
PowerVR Architecture








PowerVR Graphics Wizard Architecture 
3 Unique Features of Wizard


■ Fixed-function Ray-Box and Ray-Triangle testers

■ Coherence-Driven Task-Forming and Scheduling 

■ Streaming Scene Hierarchy Generator 
Fixed-Function Ray-Box and Ray-Triangle Testers

44x Less Area for Box Testing 
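For readers unfamiliar with what a ray-box tester computes, the following is a software sketch of the standard "slab" test against an axis-aligned bounding box. It is a textbook formulation shown purely for illustration; nothing here describes how the Wizard fixed-function unit is actually built, and the function name is an assumption.

```python
def ray_intersects_box(origin, direction, box_min, box_max):
    """Slab test: does the ray origin + t * direction (t >= 0) hit the box?

    All arguments are 3-component sequences; the box is axis-aligned.
    """
    t_near = 0.0           # the ray starts at its origin, so t cannot be negative
    t_far = float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t for which the ray is inside every slab so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

A traversal of a bounding-volume hierarchy runs this kind of test many times per ray, which is why moving it out of the shader core and into dedicated logic is attractive.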
Coherence-Gathering






Streaming Scene Hierarchy Generator 




■ Hybrid Shadows, Reflections, etc.

■ Augmented Reality

■ Production-Quality Renders

■ Order-Independent Transparency

■ Ambient Occlusion

■ Asset creation / compression

■ Global Illumination

■ Physics & Collision Detection

■ Virtual Reality: Lens correction, Ultra-low latency rendering, Lenticular Displays

■ A.I. & Line of Sight Calculations

■ Rapid photo-quality output
Ray Tracing Requirements

Sustained Ray Throughput at 1080p, 60fps

Technique vs Ray throughput

[Chart: techniques plotted against required ray throughput. Labels: Physics / AI / etc.; In-Engine Lightmap baking; Hybrid, Reflections; Hybrid, Soft Shadows, 1 light; Dynamic AO; Interactive GI (Light Probes); Lens Effects, e.g. DOF, AA, etc.; Fully ray traced game]
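As a rough guide to the scale involved, sustained ray throughput can be estimated from resolution, frame rate and the average number of rays cast per pixel. The chart's actual figures are not reproduced here; the `rays_per_pixel` values below are hypothetical inputs chosen only to show the arithmetic.

```python
def rays_per_second(width=1920, height=1080, fps=60, rays_per_pixel=1):
    """Sustained ray throughput needed for a given per-pixel ray budget."""
    return width * height * fps * rays_per_pixel


# One ray per pixel at 1080p, 60fps is roughly 124 million rays per second.
one_ray = rays_per_second()
# A hypothetical technique that averages four rays per pixel needs four times that.
four_rays = rays_per_second(rays_per_pixel=4)
```

Techniques further along the chart need more rays per pixel, so the required throughput scales linearly with that budget.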
PowerVR developer tools


PowerVR Tools

[Diagram: PowerVR tools shown alongside third-party applications. Labels: Asset Optimization; Blender; Development; Eclipse; Visual Studio]

■ PVRGeoPOD, PVRTexTool

■ PVRVFrame, PVRShaderEditor, PVRShaman

■ PVRTune, PVRTrace, PVRMonitor
PowerVR Tools

Release schedule 

• PowerVR Tools release process 

• Minor revision roughly every 6 months 


• Recent/upcoming releases 

• 3.5 SDK (April 2015) 

• 4.0 SDK (due September 2015)
PVRTrace

What is PVRTrace? 



OpenGL ES API tracer 

■ OpenGL ES 1.x, 2.0 and 3.x
recording libraries

■ GUI for analysis 


Features 

■ Inspect, analyse and playback 
captured data 


[Screenshot: PVRTraceGUI 3.5 with Shadowgun.pvrt loaded. The Static Analysis pane lists an error on eglGetConfigAttrib and warnings including "Possible VBO modification whilst in use" (W1), "Redundant glScissor call" (W2), "Redundant glViewport call" (W2), "Face culling is disabled" (W2) and "Uncompressed texture used" (W3). The call list shows glBindBuffer, glBufferSubData, glVertexAttribPointer, glDrawElements and glUniform1fv calls with their arguments.]
PVRTrace

New render state & data inspectors 
Object Data Viewer

[Screenshot: Object Data Viewer showing Texture 7980114 at call 487683, with a preview of level 0 (1024 x 1024) at 34% (Fit)]

- Parameters

GL_TEXTURE_MIN_FILTER: GL_LINEAR_MIPMAP_NEAREST
GL_TEXTURE_MAG_FILTER: GL_LINEAR
GL_TEXTURE_WRAP_S: GL_REPEAT
GL_TEXTURE_WRAP_T: GL_REPEAT
GL_TEXTURE_WRAP_R: GL_REPEAT
GL_TEXTURE_BASE_LEVEL: 0
GL_TEXTURE_COMPARE_FUNC: GL_LEQUAL
GL_TEXTURE_COMPARE_MODE: GL_NONE
GL_TEXTURE_SWIZZLE_R: GL_RED
GL_TEXTURE_SWIZZLE_G: GL_GREEN

- GL_TEXTURE_2D

Level 0: 1024 x 1024
Level 1: 512 x 512
Level 2: 256 x 256
Level 3: 128 x 128
Level 4: 64 x 64
Level 5: 32 x 32
Level 6: 16 x 16
Level 7: 8 x 8
Level 8: 4 x 4
Level 9: 2 x 2
Level 10: 1 x 1

Levels 0 to 9 each list internal format GL_COMPRESSED_RGB...




PVRTune 

What is PVRTune? 


PowerVR graphics core 
performance analyser 

■ GUI for analysis 

■ On-device server 


Features 

■ Real-time performance data 
PVRTune

Real-time GPU profiler 

• New counters 



• GPU clock speed, triangles culled, Hidden Surface Removal efficiency, SLC memory 
reads/writes and more 


• GUI changes 

• Simplified setup and navigation 

• Graphics and Compute modes 

• Tree view for counters (Overview, Tiler, Renderer etc.) 
PVRShaderEditor

Shader editor & offline profiler (with disassembly!)

[Screenshot: PVRShaderEditor 2.5 with FragShader.fsh open, showing the shader source, the disassembled hardware code and the profiling output]

FragShader.fsh


// Calculate the tex coords of the fragment (using its position on the screen), ...
lowp vec3 vAccumulatedNormal = vec3(0.0, 0.0, 1.0);
mediump vec2 vTexCoord = gl_FragCoord.xy * RcpWindowSize;

// Test depth for fog
lowp float fFogBlend = clamp(WaterToEyeLength * RcpMaxFogDepth, 0.0, 1.0);

#ifdef ENABLE_DISTORTION
// When distortion is enabled, use the normal map to calculate perturbation
vAccumulatedNormal = texture(NormalTex, BumpCoord0).rgb;
vAccumulatedNormal += texture(NormalTex, BumpCoord1).rgb;
vAccumulatedNormal -= 1.0; // Same as * 2.0 - 2.0

lowp vec2 vTmp = vAccumulatedNormal.xy;

/*
Divide by WaterToEyeLength to scale down the distortion
of fragments based on their distance from the camera
*/
vTexCoord.xy -= vTmp * (WaveDistortion / WaterToEyeLength);
#endif

#ifdef ENABLE_REFRACTION
lowp vec4 vReflectionColour = texture(ReflectionTex, vTexCoord);
lowp vec4 vRefractionColour = texture(RefractionTex, vTexCoord);

#ifdef ENABLE_FRESNEL
// Calculate the Fresnel term to determine amount of reflection for each fragme...


Disassembled HW Code 

0 : fitr.pixel r0, drc0, cf4, cf0, 8;
1 : wdf drc0
2 : smp2d.fcnorm drc0, sh20, r0, sh4, _, r12, 3;
3 : smp2d.fcnorm drc0, sh20, r2, sh4, r8, 3;
4 : smp3d.fcnorm drc1, sh32, r4, sh16, _, r15, 3;
5 : frcp i0, r7
6 : sop r11.joutj, sh2.f16.e0, i0, sub, c0, 0
    sop r18.koutk, sh2.f16.e1, r7, sub, c0, 0
7 : wdf drc0
8 : sop i1.f16.e0.joutj, r8, 0.oneminus, add, r12, 0.onem...
    sop i1.f16.e1.koutk, r9, 0.oneminus, add, r13, 0.onem...
    sopmov is5, r13
9 : sop i0.f16.e0.joutj, r10, 0.oneminus, add, r14, 0.onem...
    sop i0.f16.e1.koutk, c64.neg, 0.oneminus, add, i1.f16...
    sopmov is5, i1.f16


Fragment Shader: Compile succeeded. 


Profiling Settings 


Per-Line Cycle Estimate Total: 34 

Compiler: G6x00 
Version: REL/3.4@3147479 
Emulated Cycles: - 
Temporary Registers Used: - 
Primary Attributes Used: - 
Non-Dependent Texture Loads: - 
Global USC Instructions: 0 


Emulated Cycle Total: - 


Rogue graphics driver



Rogue graphics driver 

Release schedule 

• DDK (Driver Development Kit) release process 



• Reference driver source code released to PowerVR IP licensees 

• Minor revision roughly every 6 months 

• Top-tier customers engage early. Drivers in products shortly after official DDK release 




Rogue graphics driver 

1.4 DDK 

• Release date 

• Q4 2014 (release 1) 

• Q1 2015 (release 2) 



• OpenGL ES: Key features (release 1) 

• OpenGL ES 3.1 

• Compute shaders, shader storage buffer objects, draw indirect and more 


• OpenGL ES: Key features (release 2) 

• Android Lollipop support 
Rogue graphics driver

1.5 DDK 

• Release date 

• Q2/Q3 2015 



• OpenGL ES: Key features 

• Android Extension Pack (AEP) 

• ASTC, blend equation advanced, GPU shader model 5 and more 

• sRGB PVRTC 

• Pixel local storage 

• 128/256 bits per-pixel on-chip 
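One practical consequence of a fixed per-pixel budget is that a pixel local storage layout has to be sized up front. The sketch below assumes each member occupies 32 bits, which matches the 32-bit layout formats of the EXT_shader_pixel_local_storage extension; the member names and the helper `pls_usage` are hypothetical, and the budget available on a given core should be checked against its documentation.

```python
# Bits occupied by a few pixel local storage formats (all are 32-bit).
PLS_FORMAT_BITS = {
    "r32f": 32,
    "rg16f": 32,
    "r11f_g11f_b10f": 32,
    "rgb10_a2": 32,
    "rgba8": 32,
    "r32ui": 32,
}


def pls_usage(members, budget_bits):
    """Return (bits_used, fits) for a mapping of member name -> format."""
    used = sum(PLS_FORMAT_BITS[fmt] for fmt in members.values())
    return used, used <= budget_bits


# Hypothetical deferred-shading layout with five members.
gbuffer = {
    "albedo": "rgba8",
    "normal": "rgb10_a2",
    "depth": "r32f",
    "lighting": "r11f_g11f_b10f",
    "material": "r32ui",
}
```

Under this assumption the five-member layout above needs 160 bits, so it would only fit where the larger 256-bit budget is available; dropping one member brings it down to 128 bits.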
Rogue graphics driver

1.6 DDK 

• Release date 

• Q4 2015 



• OpenGL ES: Key features 

• Bicubic texture filtering 

• Shader group vote 

• Polygon offset clamp 

• Pixel local storage 2 

• Simultaneously write to pixel local storage and a framebuffer attachment 
Vulkan

About 

• What is Vulkan? 



• New open standard API developed by the Khronos group 

• Designed for high-efficiency access to graphics and compute on modern GPUs 


• Key features 

• Minimizes driver overhead and enables multi-threaded GPU command preparation 

• Designed for mobile, desktop, console and embedded platforms 

• Designed for all GPUs - tile based GPUs are first-class citizens! 

• SPIR-V - binary intermediate language for shaders 
Vulkan

PowerVR driver status 

• PowerVR Vulkan driver 

• Driver development on-going 

• Working with key partners on initial content bring up 

• More details at SIGGRAPH 2015

• : Vulkan, OpenGL, OpenGL ES - 5:30 PM - 7:30 PM 
PowerVR Graphics

Future roadmaps 

• What drives our roadmaps? 

• Market analysis 

• Customer feedback 

• Developer feedback 
Upcoming events

idc-UK 

• Imagination Developers Connection 2015 UK 

• 1st October, SOHO Hotel, London UK

• Register here: http://imgtec.com/idc/idc15-uk/



• Agenda 

• A full developer day including optimization tips, how to use ray tracing with raster 
graphics and more 

• Also includes guest talks from Google and Digital Legends 
Questions?

Imagination 


www.imgtec.com/idc 



